RSPEED-2849: add endpoint label to LLM Prometheus metrics#1624

Open
Lifto wants to merge 4 commits into lightspeed-core:main from Lifto:feat/rspeed-2849-llm-metrics

Conversation

Lifto (Contributor) commented Apr 28, 2026

Summary

  • Adds an endpoint label to all 5 LLM Prometheus metrics (llm_calls_total, llm_calls_failures_total, llm_calls_validation_errors_total, llm_token_sent_total, llm_token_received_total) to enable per-endpoint traffic differentiation in Grafana
  • Threads endpoint_path: str through all call chains from the 4 endpoint handlers (/v1/responses, /v1/infer, /v1/streaming_query, /v1/query) down to the metric increment sites
  • Updates unit tests for _increment_llm_call_metric, extract_token_usage, and run_shield_moderation

Why

RSPEED-2849 — enables CLA vs Goose traffic differentiation in Grafana dashboards by labeling each LLM metric with the originating API endpoint.

Cardinality

The endpoint label has exactly 4 bounded values:

  • /v1/infer
  • /v1/responses
  • /v1/streaming_query
  • /v1/query

No cardinality explosion risk.
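
The bounded label set can be sanity-checked in isolation with prometheus_client; a minimal sketch (the provider/model values are placeholders, not from the PR):

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()

# Illustrative stand-in for llm_calls_total with the new label set.
llm_calls_total = Counter(
    "ls_llm_calls_total",
    "LLM calls counter",
    ["provider", "model", "endpoint"],
    registry=registry,
)

ENDPOINTS = ("/v1/infer", "/v1/responses", "/v1/streaming_query", "/v1/query")
for endpoint in ENDPOINTS:
    llm_calls_total.labels("provider-x", "model-y", endpoint).inc()

# With a single provider/model pair, exactly one series exists per endpoint.
series = [
    s
    for metric in registry.collect()
    for s in metric.samples
    if s.name == "ls_llm_calls_total"
]
```

With one provider/model pair this yields exactly four series, one per endpoint, so the new label multiplies total cardinality by at most four.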

Stacking

This PR stacks on #1620 (feat/rspeed-2849-user-agent). Please merge #1620 first.

Summary by CodeRabbit

Release Notes

  • Improvements

    • Enhanced monitoring and observability with endpoint-level metric tracking for better service visibility and debugging
    • Added User-Agent capture in telemetry events for improved request attribution and tracing
  • Documentation

    • Updated contribution guidelines with PR title requirements and approved prefix format specifications

Lifto added 4 commits April 28, 2026 13:55
…se differentiation

Adds a sanitized user_agent field to ResponsesEventData and the Splunk
event payload, enabling differentiation between Goose and other clients
in telemetry. Extracts and sanitizes the User-Agent header (strips control
characters, truncates to 128 chars) before storing.

Closes RSPEED-2849
coderabbitai (Bot) commented Apr 28, 2026

Walkthrough

The pull request adds endpoint path tracking and user-agent handling across the API layer. It threads a constant endpoint_path through moderation, metrics, and response pipelines in four endpoint handlers, updates Prometheus metric definitions to include endpoint labels, adds user-agent extraction and sanitization in the responses module, and extends the observability event model to capture user-agent data.

Changes

  • Documentation & CI Configuration (AGENTS.md): Added a CI rule requiring pull request titles to begin with an approved JIRA issue key prefix.
  • Metrics Infrastructure (src/metrics/__init__.py): Updated five Prometheus Counter metrics to include an "endpoint" label: llm_calls_total, llm_calls_failures_total, llm_calls_validation_errors_total, llm_token_sent_total, llm_token_received_total.
  • Core Utilities, Token Usage & Metrics (src/utils/responses.py): Extended extract_token_usage, _increment_llm_call_metric, and build_turn_summary to accept and propagate an endpoint_path parameter for metric labeling.
  • Core Utilities, Shield Moderation (src/utils/shields.py): Updated run_shield_moderation to accept endpoint_path and label validation error metrics with it.
  • Endpoint Handler, User-Agent & Endpoint Path (src/app/endpoints/responses.py): Added a _get_user_agent() helper for sanitization and truncation; threaded user_agent and endpoint_path through streaming/non-streaming response paths, Splunk telemetry, and token usage extraction.
  • Endpoint Handlers, Endpoint Path Threading (src/app/endpoints/query.py, src/app/endpoints/streaming_query.py, src/app/endpoints/rlsapi_v1.py): Added constant endpoint_path definitions and threaded them through moderation, metrics labeling, token usage extraction, and response summary building.
  • Observability, Response Events (src/observability/formats/responses.py): Added an optional user_agent field to ResponsesEventData and included it in the Splunk event payload emission.
  • Unit Tests, Utilities (tests/unit/utils/test_responses.py, tests/unit/utils/test_shields.py): Updated test calls to pass the endpoint_path argument and adjusted metric label assertions to verify endpoint labeling.
  • Unit Tests, Endpoints & Observability (tests/unit/app/endpoints/test_responses_splunk.py, tests/unit/observability/formats/test_responses.py, tests/unit/app/endpoints/test_metrics.py): Added a TestGetUserAgent test class with five tests for user-agent extraction/sanitization; added three tests for the user_agent field in response events; removed an assertion for the Prometheus created gauge.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • are-ces
  • tisnik
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and concisely summarizes the main change: adding an endpoint label to LLM Prometheus metrics, which is the primary objective of the PR.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.



coderabbitai (Bot) left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/app/endpoints/streaming_query.py (1)

330-335: 🛠️ Refactor suggestion | 🟠 Major

Docstrings need the new endpoint_path parameter.

Both updated function signatures include endpoint_path, but their Args sections do not document it.

Proposed docstring update
     Args:
         responses_params: The Responses API parameters
         context: The response generator context
+        endpoint_path: API endpoint path used for metric labeling.
@@
     Args:
         turn_response: The streaming response from Llama Stack
         context: The response generator context
         turn_summary: TurnSummary to populate during streaming
+        endpoint_path: API endpoint path used for metric labeling.

As per coding guidelines, "All functions require docstrings with brief descriptions and complete type annotations for parameters and return types".

Also applies to: 700-704

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/streaming_query.py` around lines 330 - 335, Two functions
in this module now accept the endpoint_path parameter but their docstrings were
not updated; add an Args entry for endpoint_path to each affected function's
docstring describing it and its type (e.g., endpoint_path: str — the request
endpoint path used by the response generator), keep the existing parameter order
and type annotations intact, and ensure the Args section format matches the
other parameters (including the tuple[AsyncIterator[str], TurnSummary] return
description) for both functions whose signatures were changed to include
endpoint_path.
src/utils/shields.py (1)

74-83: ⚠️ Potential issue | 🟠 Major

detect_shield_violations uses unlabeled .inc() on a labeled Counter at line 80.

The metric llm_calls_validation_errors_total requires an endpoint label (defined in src/metrics/__init__.py line 44), but line 80 calls .inc() without labels. This causes a Prometheus ValueError at runtime when the code path is executed. The function must receive endpoint_path and call labels(endpoint_path).inc() to match the pattern used correctly in run_shield_moderation (line 183).

Proposed fix
-def detect_shield_violations(output_items: list[Any]) -> bool:
+def detect_shield_violations(output_items: list[Any], endpoint_path: str) -> bool:
@@
-                metrics.llm_calls_validation_errors_total.inc()
+                metrics.llm_calls_validation_errors_total.labels(endpoint_path).inc()

Update the function docstring to document the new endpoint_path parameter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/shields.py` around lines 74 - 83, The detect_shield_violations
function is incrementing a labeled Counter with .inc() without labels causing a
Prometheus ValueError; add an endpoint_path parameter to
detect_shield_violations, update its docstring to document endpoint_path, and
replace metrics.llm_calls_validation_errors_total.inc() with
metrics.llm_calls_validation_errors_total.labels(endpoint_path).inc() so the
metric is incremented with the required endpoint label (referencing
detect_shield_violations and metrics.llm_calls_validation_errors_total).
src/utils/responses.py (1)

1450-1499: ⚠️ Potential issue | 🟠 Major

Keep build_turn_summary() free of metric side effects.

This helper still calls extract_token_usage(), so it increments llm_calls_total and token counters while building a summary. Callers in src/app/endpoints/responses.py (Lines 1079-1081) and src/app/endpoints/rlsapi_v1.py (Line 722) already extracted usage earlier, so those paths are counted twice; the non-streaming blocked /v1/responses path also turns a synthetic zero-usage response into a real LLM call.

♻️ Suggested direction
 def build_turn_summary(  # pylint: disable=too-many-arguments,too-many-positional-arguments
     response: Optional[OpenAIResponseObject],
     model: str,
     endpoint_path: str,
     vector_store_ids: Optional[list[str]] = None,
     rag_id_mapping: Optional[dict[str, str]] = None,
     filter_server_tools: bool = False,
+    token_usage: Optional[TokenCounter] = None,
 ) -> TurnSummary:
@@
-    summary.rag_chunks = parse_rag_chunks(response, vector_store_ids, rag_id_mapping)
-    summary.token_usage = extract_token_usage(response.usage, model, endpoint_path)
+    summary.rag_chunks = parse_rag_chunks(response, vector_store_ids, rag_id_mapping)
+    summary.token_usage = token_usage or extract_token_usage(
+        response.usage, model, endpoint_path
+    )
     return summary

Then pass the already-computed token_usage from call sites that need it before summary construction.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 1450 - 1499, build_turn_summary
currently has a side effect of calling extract_token_usage (incrementing
metrics); remove that call and add an explicit token_usage parameter (e.g.,
token_usage: Optional[TokenUsage] = None) so callers supply the already-computed
usage instead of rebuilding it; update call sites that previously relied on
build_turn_summary (notably the handlers in src/app/endpoints/responses.py and
src/app/endpoints/rlsapi_v1.py) to compute token_usage once via
extract_token_usage(response.usage, model, endpoint_path) and pass it into
build_turn_summary, and ensure build_turn_summary simply assigns the provided
token_usage to summary.token_usage without calling extract_token_usage itself.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 98aed2a2-7c3d-43f5-9f88-cb41d554dc91

📥 Commits

Reviewing files that changed from the base of the PR and between 93b2bb5 and 8ae0d65.

📒 Files selected for processing (14)
  • AGENTS.md
  • src/app/endpoints/query.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/streaming_query.py
  • src/metrics/__init__.py
  • src/observability/formats/responses.py
  • src/utils/responses.py
  • src/utils/shields.py
  • tests/unit/app/endpoints/test_metrics.py
  • tests/unit/app/endpoints/test_responses_splunk.py
  • tests/unit/observability/formats/test_responses.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_shields.py
💤 Files with no reviewable changes (1)
  • tests/unit/app/endpoints/test_metrics.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: build-pr
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)

Import FastAPI dependencies from fastapi module (e.g., APIRouter, HTTPException, Request, status, Depends)

Import Llama Stack client as from llama_stack_client import AsyncLlamaStackClient

All modules must start with descriptive docstrings explaining purpose

Use logger = get_logger(__name__) from log.py for module logging

Use Final[type] as type hint for all constants

Use @field_validator and @model_validator for custom validation in Pydantic models

All functions require docstrings with brief descriptions and complete type annotations for parameters and return types

Use typing_extensions.Self type hint for model validators

Use modern Union syntax str | int instead of Union[str, int] for type hints

Use Optional[Type] for nullable types in function signatures

Use snake_case naming with descriptive, action-oriented function names (e.g., get_, validate_, check_)

Avoid in-place parameter modification anti-patterns - return new data structures instead of modifying input parameters

Use async def for I/O operations and external API calls in Python functions

Handle APIConnectionError from Llama Stack in error handling logic

Use standard log levels with clear purposes: debug() for diagnostics, info() for general execution, warning() for unexpected events, error() for serious problems

All classes require descriptive docstrings explaining purpose

Use PascalCase class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface

Pydantic configuration classes must extend ConfigurationBase, data models must extend BaseModel

Abstract classes must use ABC with @abstractmethod decorators for interface definitions

Pydantic models must use @model_validator and @field_validator for validation logic

Use complete type annotations for all class attributes with specific types...

Files:

  • src/observability/formats/responses.py
  • src/utils/shields.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/metrics/__init__.py
  • src/app/endpoints/responses.py
  • src/utils/responses.py
  • src/app/endpoints/rlsapi_v1.py
AGENTS.md

📄 CodeRabbit inference engine (CLAUDE.md)

Document agent implementations and their configurations in AGENTS.md

Files:

  • AGENTS.md
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest for all unit and integration tests, do not use unittest framework

Use pytest-mock for AsyncMock objects in test files

Use pytest.mark.asyncio marker for asynchronous test functions

Files:

  • tests/unit/app/endpoints/test_responses_splunk.py
  • tests/unit/observability/formats/test_responses.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_shields.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/rlsapi_v1.py
src/**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files must contain brief package descriptions

Files:

  • src/metrics/__init__.py
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Always check `pyproject.toml` for existing dependencies before adding new ones
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Always verify current library versions in `pyproject.toml` rather than assuming versions
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Check `constants.py` for shared constants before defining new ones
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Unit tests require 60% code coverage, integration tests require 10% coverage
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Run `uv run make format` to auto-format code with black and ruff before completion
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Run `uv run make verify` to execute all linters (black, pylint, pyright, ruff, docstyle, check-types) before completion
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Create unit tests for all new code
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Ensure all tests pass before submission
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Use pylint with `source-roots = "src"` configuration for static analysis
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Never commit secrets/keys, use environment variables for sensitive data
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Use bandit for security issue detection in code
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: PR titles MUST start with a JIRA issue key prefix (LCORE-, RSPEED-, MGTM-, OLS-, RHDHPAI-, LEADS-)
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Use `uv sync --group dev --group llslibdev` for setting up development dependencies
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Always use `uv run` prefix for executing commands in the project
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack

Timestamp: 2026-04-28T21:14:00.744Z
Learning: Follow existing code patterns in the module you're modifying
📚 Learning: 2026-03-02T16:38:30.287Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T16:38:30.287Z
Learning: Applies to AGENTS.md : Document agent implementations and their configurations in AGENTS.md

Applied to files:

  • AGENTS.md
📚 Learning: 2026-03-02T15:57:10.746Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-stack PR: 1250
File: AGENTS.md:11-12
Timestamp: 2026-03-02T15:57:10.746Z
Learning: In AGENTS.md, Linting Tools section should list actual tool names (e.g., mypy, pylint, pyright, ruff, black) instead of make target names (e.g., check-types). Ensure the section documents the individual tools that perform linting, not the Makefile targets that invoke them, so readers understand which tools are used and how to configure/run them directly.

Applied to files:

  • AGENTS.md
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/responses.py
  • src/app/endpoints/rlsapi_v1.py
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).

Applied to files:

  • src/app/endpoints/streaming_query.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Import FastAPI dependencies with: `from fastapi import APIRouter, HTTPException, Request, status, Depends`

Applied to files:

  • src/app/endpoints/responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.

Applied to files:

  • src/app/endpoints/rlsapi_v1.py
🪛 LanguageTool
AGENTS.md

[uncategorized] ~198-~198: The official name of this software platform is spelled with a capital “H”.
Context: ...es this via pr-title-checker (config: .github/pr-title-checker-config.json). Allowe...

(GITHUB)

🔇 Additional comments (7)
AGENTS.md (1)

196-204: Nice addition for CI title enforcement visibility.

This is clear and gives contributors concrete pass/fail examples for PR naming.

src/utils/shields.py (1)

122-126: Good propagation of endpoint label into shield moderation metrics.

The new endpoint_path threading and labeled increment in the blocked branch are aligned with the endpoint-level observability objective.

Also applies to: 181-184

tests/unit/app/endpoints/test_responses_splunk.py (1)

721-773: User-Agent sanitization tests are well-targeted.

These cases cover the important behavioral edges (empty, control chars, truncation, valid pass-through).

tests/unit/utils/test_responses.py (1)

2232-2300: Good update of metric tests for endpoint-labeled counters.

The call signatures and label assertions are consistent with the new endpoint dimension.

Also applies to: 2553-2596

tests/unit/utils/test_shields.py (1)

121-123: Nice synchronization of shield tests with endpoint-labeled metrics.

The updated call signatures and label assertion reduce drift with the production metric behavior.

Also applies to: 152-154, 194-201, 234-236, 265-267, 331-333, 349-350, 382-383

src/observability/formats/responses.py (1)

27-27: Telemetry schema and payload mapping for user_agent look good.

Defaulting to None keeps compatibility while enabling the new field.

Also applies to: 49-49

tests/unit/observability/formats/test_responses.py (1)

23-35: Good coverage for user_agent defaulting and payload emission.

These tests validate both explicit and omitted-field behavior cleanly.

Also applies to: 117-163

Comment on lines 318 to 323
async def retrieve_response_generator(
responses_params: ResponsesApiParams,
context: ResponseGeneratorContext,
endpoint_path: str = "",
) -> tuple[AsyncIterator[str], TurnSummary]:
"""

⚠️ Potential issue | 🟠 Major

Make endpoint_path required to avoid accidental empty-label metrics.

Using endpoint_path: str = "" can silently emit an extra endpoint="" series if a caller omits the argument. That breaks the intended bounded endpoint set.

Proposed fix
 async def retrieve_response_generator(
     responses_params: ResponsesApiParams,
     context: ResponseGeneratorContext,
-    endpoint_path: str = "",
+    endpoint_path: str,
 ) -> tuple[AsyncIterator[str], TurnSummary]:
@@
 async def response_generator(  # pylint: disable=too-many-branches,too-many-statements,too-many-locals
     turn_response: AsyncIterator[OpenAIResponseObjectStream],
     context: ResponseGeneratorContext,
     turn_summary: TurnSummary,
-    endpoint_path: str = "",
+    endpoint_path: str,
 ) -> AsyncIterator[str]:

Also applies to: 688-693

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/streaming_query.py` around lines 318 - 323, The function
signature for retrieve_response_generator currently uses an empty-string default
for endpoint_path which can emit accidental endpoint="" metrics; change the
signature to require endpoint_path: str (remove the = "" default) in
retrieve_response_generator and the other similar function at the second
occurrence (lines ~688-693), then update all call sites to pass a concrete
endpoint_path value and adjust any tests or callers that relied on the default;
also update related docstrings/comments to reflect the parameter is required.
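
A small standalone sketch of the hazard (metric and helper names are hypothetical):

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()
demo_calls = Counter(
    "demo_llm_calls_total", "demo counter", ["endpoint"], registry=registry
)

def record_call(endpoint_path: str = "") -> None:
    # The empty-string default means a forgetful caller silently
    # creates an endpoint="" series instead of failing loudly.
    demo_calls.labels(endpoint_path).inc()

record_call("/v1/query")
record_call()  # argument omitted: a new endpoint="" series appears
```

Making endpoint_path a required positional parameter turns the omission into a TypeError at the call site rather than a stray Prometheus series.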

Comment thread src/metrics/__init__.py
Comment on lines 31 to +58
 llm_calls_total = Counter(
-    "ls_llm_calls_total", "LLM calls counter", ["provider", "model"]
+    "ls_llm_calls_total", "LLM calls counter", ["provider", "model", "endpoint"]
 )

 # Metric that counts how many LLM calls failed
 llm_calls_failures_total = Counter(
-    "ls_llm_calls_failures_total", "LLM calls failures", ["provider", "model"]
+    "ls_llm_calls_failures_total",
+    "LLM calls failures",
+    ["provider", "model", "endpoint"],
 )

 # Metric that counts how many LLM calls had validation errors
 llm_calls_validation_errors_total = Counter(
-    "ls_llm_validation_errors_total", "LLM validation errors"
+    "ls_llm_validation_errors_total", "LLM validation errors", ["endpoint"]
 )

 # TODO(lucasagomes): Add metric for token usage
 # https://issues.redhat.com/browse/LCORE-411
 llm_token_sent_total = Counter(
-    "ls_llm_token_sent_total", "LLM tokens sent", ["provider", "model"]
+    "ls_llm_token_sent_total", "LLM tokens sent", ["provider", "model", "endpoint"]
 )

 # TODO(lucasagomes): Add metric for token usage
 # https://issues.redhat.com/browse/LCORE-411
 llm_token_received_total = Counter(
-    "ls_llm_token_received_total", "LLM tokens received", ["provider", "model"]
+    "ls_llm_token_received_total",
+    "LLM tokens received",
+    ["provider", "model", "endpoint"],

🧹 Nitpick | 🔵 Trivial

Centralize the allowed endpoint label values.

These counters now depend on endpoint strings supplied from several call chains, so a typo or empty fallback will create a new Prometheus series silently. Please move the four supported endpoint paths behind shared constants (or an enum/Literal) and reuse them at every metric call site to preserve the bounded-cardinality guarantee.

Based on learnings: Check constants.py for shared constants before defining new ones.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/metrics/__init__.py` around lines 31 - 58, The endpoint label values used
by the Prometheus counters llm_calls_total, llm_calls_failures_total,
llm_calls_validation_errors_total, llm_token_sent_total and
llm_token_received_total must be centralized: define a shared set/enum/Literal
of allowed endpoint strings in constants.py (or reuse existing ones there) and
replace any hardcoded/inline endpoint values at each metric emission site to
reference that central constant to prevent new series from typos/empties; update
any code that constructs the label value for the "endpoint" label to
validate/normalize against that central constant set and fall back to a single
explicit "unknown" constant if needed.
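
One possible shape for the suggested centralization, following the repo's Final[type] constant convention (all names below are hypothetical, not from the PR):

```python
from typing import Final

# Hypothetical constants.py additions: the four supported endpoint paths
# plus an explicit fallback, kept behind a single bounded set.
ENDPOINT_QUERY: Final[str] = "/v1/query"
ENDPOINT_STREAMING_QUERY: Final[str] = "/v1/streaming_query"
ENDPOINT_RESPONSES: Final[str] = "/v1/responses"
ENDPOINT_INFER: Final[str] = "/v1/infer"
ENDPOINT_UNKNOWN: Final[str] = "unknown"

ALLOWED_ENDPOINTS: Final[frozenset[str]] = frozenset(
    {ENDPOINT_QUERY, ENDPOINT_STREAMING_QUERY, ENDPOINT_RESPONSES, ENDPOINT_INFER}
)

def normalize_endpoint(endpoint_path: str) -> str:
    """Clamp an arbitrary label value to the bounded endpoint set."""
    return endpoint_path if endpoint_path in ALLOWED_ENDPOINTS else ENDPOINT_UNKNOWN
```

Routing every metric emission through normalize_endpoint() (or the constants directly) would keep typos and empty strings from minting new series.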

